1,662 research outputs found
Using natural language processing to improve biomedical concept normalization and relation mining
This thesis concerns the use of natural language processing for improving biomedical concept normalization and relation mining. We begin by introducing the background of biomedical text mining, and subsequently describe a typical text mining pipeline, some key issues and problems in mining biomedical texts, and the possibility of using natural language processing to solve these problems. Finally, we end with an outline of the work done in this thesis
Training text chunkers on a silver standard corpus: Can silver replace gold?
Background: To train chunkers in recognizing noun phrases and verb phrases in biomedical text, an annotated corpus is required. The creation of gold standard corpora (GSCs), however, is expensive and time-consuming. GSCs therefore tend to be small and to focus on specific subdomains, which limits their usefulness. We investigated the use of a silver standard corpus (SSC) that is automatically generated by combining the outputs of multiple chunking systems. We explored two use scenarios: one in which chunkers are trained on an SSC in a new domain for which a GSC is not available, and one in which chunkers are trained on an available, albeit small, GSC supplemented with an SSC. Results: We have tested the two scenarios using three chunkers, Lingpipe, OpenNLP, and Yamcha, and two different corpora, GENIA and PennBioIE. For the first scenario, we showed that the systems trained for noun-phrase recognition on the SSC in one domain performed 2.7-3.1 percentage
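One simple way to generate such a silver standard from several chunkers is token-level majority voting over their IOB tag sequences. The sketch below assumes aligned tokenization across systems and a fixed vote threshold; the actual SSC construction procedure may differ.

```python
from collections import Counter

def harmonise_iob(tag_sequences, min_votes=2):
    """Token-level majority vote over IOB tag sequences from several chunkers.

    tag_sequences: list of equal-length lists of IOB tags, one per chunker.
    The winning tag is kept if it reaches min_votes; otherwise the token
    falls back to 'O' (outside any chunk).
    """
    harmonised = []
    for token_tags in zip(*tag_sequences):
        tag, votes = Counter(token_tags).most_common(1)[0]
        harmonised.append(tag if votes >= min_votes else "O")
    return harmonised

# Three hypothetical chunkers disagreeing on two tokens:
chunkers = [
    ["B-NP", "I-NP", "O",    "B-VP"],
    ["B-NP", "I-NP", "B-NP", "B-VP"],
    ["B-NP", "O",    "O",    "B-VP"],
]
print(harmonise_iob(chunkers))  # ['B-NP', 'I-NP', 'O', 'B-VP']
```

A chunker trained on the harmonised tags then learns the consensus behaviour of the ensemble rather than the idiosyncrasies of any single system.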
Prototyping to elicit user requirements for product development: Using head-mounted augmented reality when designing interactive devices
Data availability: I have shared the data in the paper. Copyright © 2022 The Author(s). Designers of interactive devices are challenged by the need to accurately elicit user requirements from low-cost prototypes at the early stages of the design process. Head-mounted augmented reality (AR) can potentially assist in this process by economically representing physical-digital blended features with relatively high-fidelity prototypes. To explore this potential, we present and evaluate a head-mounted AR-enhanced hybrid prototyping system created in the context of a fan product development process. We conducted a mixed-methods study comparing the AR-enhanced prototyping method with a conventional prototyping method. The results reveal that the AR system can elicit user requirements similar to those of the conventional prototyping method, with an improved overall experience
ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus
Background: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. Results: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained F-scores from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development
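The core of a ConText-style algorithm is forward scoping from lexical triggers: a trigger negates the tokens that follow it until a termination term or a window limit is reached. The sketch below uses a few hypothetical Dutch triggers and a fixed window; the real ContextD trigger list and scope rules are more elaborate.

```python
# Hypothetical Dutch negation triggers and termination terms; the actual
# ContextD trigger list (41 unique triggers) is not reproduced here.
NEG_TRIGGERS = {"geen", "niet", "zonder"}
TERMINATORS = {"maar", "behalve"}
SCOPE = 5  # forward window (in tokens) a trigger may negate

def negated_spans(tokens):
    """Return the indices of tokens inside the forward scope of a negation trigger."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEG_TRIGGERS:
            for j in range(i + 1, min(i + 1 + SCOPE, len(tokens))):
                if tokens[j].lower() in TERMINATORS:
                    break  # scope ends at a termination term
                negated.add(j)
    return negated

tokens = "patient heeft geen koorts maar wel hoofdpijn".split()
print(sorted(negated_spans(tokens)))  # [3] -> 'koorts' (fever) is negated
```

The same scoping machinery, with different trigger sets, handles the experiencer and temporality properties.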
Knowledge-based extraction of adverse drug events from biomedical text
Background: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied the system for the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge
Using rule-based natural language processing to improve disease normalization in biomedical text
Background and objective: In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. Methods: We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. Results: Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. Conclusions: We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated
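One kind of rule such an NLP module can contribute is coordination resolution, which dictionary lookup alone misses: a coordinated mention like "breast and ovarian cancer" names two diseases but matches neither dictionary entry directly. The toy rule below illustrates the idea for the simplest pattern only; it is an assumed example, not the paper's actual rule set.

```python
def expand_coordination(phrase):
    """Resolve a simple coordinated disease mention into separate mentions,
    e.g. 'breast and ovarian cancer' -> ['breast cancer', 'ovarian cancer'].
    A toy rule handling only 'X and Y HEAD'; real rule-based modules cover
    many more syntactic patterns.
    """
    words = phrase.split()
    if "and" in words:
        i = words.index("and")
        head = words[i + 2:]  # shared head noun(s), e.g. ['cancer']
        if head:
            left = words[:i] + head
            right = words[i + 1:i + 2] + head
            return [" ".join(left), " ".join(right)]
    return [phrase]

print(expand_coordination("breast and ovarian cancer"))
# ['breast cancer', 'ovarian cancer']
```

Each expanded mention can then be passed to the dictionary-based normalizer, which maps it to its own concept identifier.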
Probing Shadowed Nuclear Sea with Massive Gauge Bosons in the Future Heavy-Ion Collisions
The production of the massive gauge bosons Z and W could provide an excellent tool to study cold nuclear matter effects and the modifications of nuclear parton distribution functions (nPDFs) relative to the parton distribution functions (PDFs) of a free proton in high-energy nuclear reactions at the LHC, as well as in heavy-ion collisions (HIC) at the much higher center-of-mass energies available at future colliders. In this paper we calculate the rapidity and transverse momentum distributions of the vector bosons and their nuclear modification factors in p+Pb and Pb+Pb collisions at future-collider energies in the framework of perturbative QCD, utilizing three parametrization sets of nPDFs: EPS09, DSSZ, and nCTEQ. It is found that in heavy-ion collisions at such high colliding energies, both the rapidity distribution and the transverse momentum spectrum of the vector bosons are considerably suppressed in wide kinematic regions with respect to p+p reactions, due to the large nuclear shadowing effect. We demonstrate that in massive vector boson production, processes with sea quarks in the initial state may contribute more than those with valence quarks; therefore, in future heavy-ion collisions the isospin effect is less pronounced and the charge asymmetry of the W boson will be reduced significantly compared to that at the LHC. A large difference between the results with nCTEQ and those with EPS09 and DSSZ is observed in the nuclear modifications of both the rapidity and transverse momentum distributions in future HIC. Comment: 13 pages, 21 figures, version accepted for publication in Eur. Phys. J.
The CALBC Silver Standard Corpus for Biomedical Named Entities - A Study in Harmonizing the Contributions from Four Independent Named Entity Taggers
The production of gold standard corpora is time-consuming and costly. We propose an alternative: the 'silver standard corpus' (SSC), a corpus generated by harmonising the annotations delivered by a selection of annotation systems. The systems have to share the type system for the annotations, and the harmonisation solution has to use a suitable similarity measure for the pair-wise comparison of the annotations. The annotation systems have been evaluated against the harmonised set (630,324 sentences; 15,956,841 tokens). We demonstrate that the annotation of proteins and genes shows higher diversity across all annotation solutions used, leading to lower agreement against the harmonised set in comparison to the annotations of diseases and species. An analysis of the most frequent annotations from all systems shows that high agreement amongst systems leads to the selection of terms that are suitable to be kept in the harmonised set. This is the first large-scale approach to generating an annotated corpus from automated annotation systems. Further research is required to understand how the annotations from different systems have to be combined to produce the best annotation result for a harmonised corpus
MCL-CAw: A refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure
Abstract Background The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher-level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Over the years, several affinity scoring schemes have also been devised to improve the quality of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein-protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In attempting to tackle this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracies of correctly predicted complexes. Results Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for the reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. 
The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes, with better accuracies than MCL, particularly in the presence of natural noise; (ii) affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. Conclusions We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw. Peer Reviewed
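The core-attachment idea can be sketched as a post-processing step on one MCL cluster: proteins with high intra-cluster weighted connectivity form the core, and peripheral proteins are retained as attachments only if they connect well to that core. This is a simplified illustration under assumed thresholds, not the exact MCL-CAw procedure.

```python
def refine_core_attachment(cluster, weights, keep_ratio=0.5):
    """Toy core-attachment refinement of a single MCL cluster.

    cluster: set of protein names; weights: dict {(u, v): affinity score}.
    Proteins whose intra-cluster weighted degree reaches the cluster average
    form the core; the rest are kept as attachments only if their connectivity
    to the core is at least keep_ratio of that average. Thresholds are
    illustrative assumptions.
    """
    def wdeg(p, members):
        # Weighted degree of p restricted to `members` (edges stored either way).
        return sum(weights.get((p, q), weights.get((q, p), 0.0))
                   for q in members if q != p)

    avg = sum(wdeg(p, cluster) for p in cluster) / len(cluster)
    core = {p for p in cluster if wdeg(p, cluster) >= avg}
    attachments = {p for p in cluster - core
                   if wdeg(p, core) >= keep_ratio * avg}
    return core, attachments

# A tight triangle a-b-c with a weakly attached protein d:
core, att = refine_core_attachment(
    {"a", "b", "c", "d"},
    {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0, ("c", "d"): 0.2},
)
print(sorted(core), sorted(att))  # ['a', 'b', 'c'] []
```

Discarding the weakly connected protein d is exactly the kind of pruning that removes the "additional proteins" reducing the accuracy of MCL's raw clusters.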
Constraints on Spin-Independent Nucleus Scattering with sub-GeV Weakly Interacting Massive Particle Dark Matter from the CDEX-1B Experiment at the China Jin-Ping Laboratory
We report results on searches for weakly interacting massive particles (WIMPs) with sub-GeV masses via WIMP-nucleus spin-independent scattering with the Migdal effect incorporated. Analyses of time-integrated (TI) and annual modulation (AM) effects on CDEX-1B data are performed, with 737.1 kg·day exposure and a 160 eVee threshold for the TI analysis, and 1107.5 kg·day exposure and a 250 eVee threshold for the AM analysis. With the Migdal effect incorporated, the sensitive windows in WIMP mass are expanded by an order of magnitude towards lower dark matter masses. New limits on the spin-independent WIMP-nucleon cross section at 90% confidence level are derived for the TI analysis in the mass range 50-180 MeV/c^2, and for the AM analysis in the range 75 MeV/c^2 to 3.0 GeV/c^2. Comment: 5 pages, 4 figures